
fix(core): require line-anchored frontmatter fences in file_utils #973

Merged: groksrc merged 2 commits into main from fix/972-line-anchored-frontmatter-fences on Jun 11, 2026

groksrc (Member) commented Jun 11, 2026

Closes #972

Problem

has_frontmatter(), parse_frontmatter(), and remove_frontmatter() in src/basic_memory/file_utils.py detected frontmatter by substring/split (content.startswith("---") plus content.split("---", 2)) rather than by line-anchored fences.

Repro

A single-line string that merely starts with ---, where each \n is a literal backslash-n pair rather than a newline (a very common CLI/agent input shape):

bm tool write-note --title "Meeting Notes" --folder meetings \
  --content "---\nstatus: active\n---\nDiscussed Q3 roadmap with Anthony."

The CLI receives the literal one-line string "---\nstatus: active\n---\nDiscussed Q3 roadmap with Anthony." The loose logic treated it as frontmatter, with two effects:

  1. The YAML parser read \nstatus as a key, which was merged into the note's YAML and written to disk.
  2. The body was silently transformed (the inline ---…--- segment was stripped).
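
The failure can be reproduced without the CLI. The following is a minimal sketch of the loose logic as described above (the function name `loose_has_frontmatter` is hypothetical, not the project's code); it only shows what `startswith` and `split` do with the one-liner, and leaves the YAML step out:

```python
# The repro string: a raw string, so each \n is two characters
# (backslash + n), not a newline.
one_liner = r"---\nstatus: active\n---\nDiscussed Q3 roadmap with Anthony."


def loose_has_frontmatter(content: str) -> bool:
    """Hypothetical stand-in for the old substring-based check."""
    return content.startswith("---")


# The old split-based parse: everything between the first two "---"
# occurrences is treated as YAML, regardless of line boundaries.
parts = one_liner.split("---", 2)

print(loose_has_frontmatter(one_liner))  # True
print(parts[1])  # \nstatus: active\n   (would be handed to the YAML parser)
print(parts[2])  # \nDiscussed Q3 roadmap with Anthony.   (the "body")
```

The middle piece is what the PR description says was parsed into the garbage `\nstatus` key.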

Fix

Shape chosen: (b) anchor the parsing ourselves. A new shared _split_frontmatter() helper requires both fences on their own line (^---[ \t]*$); all three public helpers delegate to it, so detection is consistent.
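
As a rough illustration of the line-anchored approach, here is a minimal sketch. It is not the merged `_split_frontmatter()`: the name `split_frontmatter_sketch`, the `(yaml_text, body)` return shape, and the `None` result for "no frontmatter" are assumptions made for this example. Only the fence pattern `^---[ \t]*$`, BOM stripping, and the skipping of leading blank lines come from the description.

```python
import re
from typing import Optional, Tuple

# Fence pattern quoted in the PR: three dashes, optional trailing spaces/tabs.
FENCE = re.compile(r"^---[ \t]*$")


def split_frontmatter_sketch(content: str) -> Optional[Tuple[str, str]]:
    """Return (yaml_text, body) if both fences sit on their own lines, else None.

    Hypothetical helper for illustration only.
    """
    # Strip a leading BOM, then work line by line, keeping line endings.
    lines = content.lstrip("\ufeff").splitlines(keepends=True)

    # Skip leading blank lines before the opening fence.
    start = 0
    while start < len(lines) and lines[start].strip() == "":
        start += 1

    # The first non-blank line must be a bare fence.
    if start >= len(lines) or not FENCE.match(lines[start].rstrip("\r\n")):
        return None

    # The closing fence must also be a whole line.
    for end in range(start + 1, len(lines)):
        if FENCE.match(lines[end].rstrip("\r\n")):
            return "".join(lines[start + 1:end]), "".join(lines[end + 1:])

    return None  # opening fence without a closing fence
```

With this shape, the one-line repro is rejected because its only line is not a bare `---`, while a normal multi-line document still splits into YAML text and body.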

Why (b) over delegating to python-frontmatter: the existing helpers raise ParseError with specific messages that callers and tests assert on ("Content has no frontmatter", "Invalid frontmatter format", "Frontmatter must be a YAML dictionary", "Invalid YAML in frontmatter"). Anchoring in place keeps the diff minimal and preserves every one of those messages, plus BOM stripping, empty frontmatter -> {}, and non-dict -> ParseError.

The helper skips leading blank lines before the opening fence, so dedented heredoc-style content (a string beginning with a newline) still parses exactly as before — this does not relax line-anchoring, since the single-line repro's first line is not a bare ---.
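
To make the blank-line point concrete, a small sketch (the helper `opens_with_fence` is hypothetical and checks only the opening fence): skipping blank lines changes which line is tested, not how strictly it is tested.

```python
import re
import textwrap

FENCE = re.compile(r"^---[ \t]*$")


def opens_with_fence(content: str) -> bool:
    """True if the first non-blank line is a bare --- fence (illustrative only)."""
    for line in content.splitlines():
        if line.strip():
            return FENCE.match(line) is not None
    return False


# Dedented heredoc-style content: the string begins with a newline.
heredoc = textwrap.dedent("""
    ---
    title: Roadmap
    ---
    Body text.
""")

# The single-line repro: literal backslash-n, no real newlines.
one_liner = r"---\nstatus: active\n---\nDiscussed Q3 roadmap with Anthony."

print(opens_with_fence(heredoc))    # True
print(opens_with_fence(one_liner))  # False
```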

All callers were reviewed (markdown/utils.py merge path, entity_service, sync_service, file_service, batch_indexer); none relied on the loose substring behavior.

Tests

  • tests/utils/test_file_utils.py: line-anchored detection for the exact one-line repro, inline --- later in the first line, opening fence with no closing fence, and fences with trailing whitespace (valid); plus a pass-through regression confirming remove_frontmatter leaves the one-liner intact and parse_frontmatter raises rather than inventing a \nstatus key.
  • tests/mcp/test_tool_write_note.py: integration-level regression through write_note — the literal one-liner is stored verbatim as the body and no garbage key leaks into the generated YAML frontmatter.
  • All pre-existing frontmatter tests stay green.

Gates

  • uv run pytest tests/utils/test_file_utils.py tests/mcp/test_tool_write_note.py — 90 passed
  • caller suites (markdown, entity_service, sync, batch_indexer) — 159 passed
  • uv run ruff check / ruff format --check on changed files — clean
  • uv run ty check src — clean
  • 100% coverage on changed lines

🤖 Generated with Claude Code

groksrc and others added 2 commits June 11, 2026 12:51
has_frontmatter(), parse_frontmatter(), and remove_frontmatter() detected
frontmatter by substring/split (`content.startswith("---")` plus
`content.split("---", 2)`) rather than by line-anchored fences. A single-line
string that merely starts with `---` — e.g. `---\nstatus: active\n---\nBody`
where `\n` are literal backslash-n characters, a common CLI/agent input shape —
was misread as frontmatter: yaml parsed `\nstatus` as a key that got merged into
the note's YAML on disk, and the body was silently transformed.

Introduce a shared `_split_frontmatter()` helper that anchors both fences to
their own lines (`^---[ \t]*$`), tolerating leading blank lines so dedented
heredoc-style content still parses. All three public helpers now delegate to it,
preserving existing behavior for valid frontmatter (BOM stripping, empty -> {},
non-dict -> ParseError, and the existing ParseError messages callers assert on).

Closes #972

Co-Authored-By: Claude <noreply@anthropic.com>
Signed-off-by: Drew Cain <groksrc@gmail.com>
@groksrc groksrc marked this pull request as ready for review June 11, 2026 18:15
@groksrc groksrc merged commit 3ce42de into main Jun 11, 2026
23 checks passed
@groksrc groksrc deleted the fix/972-line-anchored-frontmatter-fences branch June 11, 2026 18:30
Linked issue closed by this pull request: Frontmatter fence detection is not line-anchored: inline '---' in single-line content gets parsed as frontmatter and writes garbage YAML keys
